

Random Features Hopfield Networks generalize retrieval to previously unseen examples

Kalaj, Silvio, Lauditi, Clarissa, Perugini, Gabriele, Lucibello, Carlo, Malatesta, Enrico M., Negri, Matteo

arXiv.org Artificial Intelligence

It has been recently shown that a learning transition happens when a Hopfield Network stores examples generated as superpositions of random features, at which point new attractors corresponding to those features appear in the model. In this work we reveal that the network also develops attractors corresponding to previously unseen examples generated with the same set of features. We explain this surprising behaviour in terms of spurious states of the learned features: we argue that, as the number of stored examples increases beyond the learning transition, the model also learns to mix the features to represent both stored and previously unseen examples. We support this claim by computing the phase diagram of the model.
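The storage scheme the abstract describes can be sketched numerically: examples are signs of superpositions of random binary features, stored with the usual Hebbian rule, and zero-temperature dynamics are run from a previously unseen example drawn from the same features. A minimal illustration with hypothetical sizes (N, D, P) not taken from the paper, and with no claim of reproducing its phase diagram:

```python
import numpy as np

rng = np.random.default_rng(0)

N, D, P = 400, 20, 60   # neurons, random features, stored examples (illustrative)
# Random binary features.
f = rng.choice([-1.0, 1.0], size=(D, N))

def make_example():
    # Example = sign of a superposition of the features with Gaussian coefficients.
    c = rng.standard_normal(D)
    return np.sign(c @ f)

stored = np.array([make_example() for _ in range(P)])

# Hebbian storage of the examples (not of the features themselves).
J = stored.T @ stored / N
np.fill_diagonal(J, 0.0)

def retrieve(x, steps=50):
    # Zero-temperature synchronous dynamics until a fixed point is reached.
    for _ in range(steps):
        x_new = np.sign(J @ x)
        x_new[x_new == 0] = 1.0
        if np.array_equal(x_new, x):
            break
        x = x_new
    return x

unseen = make_example()               # a new example from the same feature set
fixed_point = retrieve(unseen.copy())
overlap = fixed_point @ unseen / N    # overlap of the reached state with the unseen example
print(f"overlap with unseen example: {overlap:.2f}")
```

Whether the unseen example is actually an attractor depends on where (N, D, P) sit in the phase diagram computed in the paper; the sketch only sets up the measurement.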


#IJCAI2023 distinguished paper: Interview with Maurice Funk – knowledge bases and querying

AIHub

Maurice Funk and co-authors Balder ten Cate, Jean Christoph Jung and Carsten Lutz won a distinguished paper award at the 32nd International Joint Conference on Artificial Intelligence (IJCAI) for their work SAT-Based PAC Learning of Description Logic Concepts. In this interview, Maurice tells us more about knowledge bases and querying, why this is an interesting area for study, and their methodology and results. Our research is in the area of knowledge representation, or more specifically knowledge bases and querying. A knowledge base contains facts, like a traditional database (e.g. "Bob is a fish" and "Amelia is a dog"), but also background knowledge formulated in some formal language e.g.


Membership inference attacks detect data used to train machine learning models

#artificialintelligence

One of the wonders of machine learning is that it turns any kind of data into mathematical equations. Once you train a machine learning model on training examples (whether images, audio, raw text, or tabular data), what you get is a set of numerical parameters. In most cases, the model no longer needs the training dataset and uses the tuned parameters to map new, unseen examples to categories or value predictions. You can then discard the training data and publish the model on GitHub or run it on your own servers without worrying about storing or distributing sensitive information contained in the training dataset. But a type of attack called "membership inference" makes it possible to detect the data used to train a machine learning model.
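A common way to see this effect in practice is a loss-threshold attack: an overfit model tends to assign systematically lower loss to its training (member) examples than to unseen (non-member) ones, and thresholding the per-example loss separates the two groups. A minimal sketch on synthetic data with a hand-rolled logistic regression; the sizes, seed, and median threshold are illustrative assumptions, not a reference attack implementation:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data: random features with random labels, so any good fit memorizes.
d, n = 100, 50
X_member = rng.standard_normal((n, d))
y_member = rng.integers(0, 2, n).astype(float)
X_nonmember = rng.standard_normal((n, d))
y_nonmember = rng.integers(0, 2, n).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Overfit a logistic-regression "model" on the member set with plain gradient descent.
w = np.zeros(d)
for _ in range(2000):
    p = sigmoid(X_member @ w)
    w -= 0.1 * X_member.T @ (p - y_member) / n

def per_example_loss(X, y):
    # Per-example cross-entropy under the trained model.
    p = np.clip(sigmoid(X @ w), 1e-12, 1 - 1e-12)
    return -(y * np.log(p) + (1 - y) * np.log1p(-p))

loss_member = per_example_loss(X_member, y_member)
loss_nonmember = per_example_loss(X_nonmember, y_nonmember)

# Loss-threshold attack: low loss => guess "member".
threshold = np.median(np.concatenate([loss_member, loss_nonmember]))
accuracy = ((loss_member < threshold).mean()
            + (loss_nonmember >= threshold).mean()) / 2
print(f"attack accuracy: {accuracy:.2f}")
```

The gap between the member and non-member loss distributions is exactly the signal a membership-inference adversary exploits; regularization that shrinks the gap also weakens the attack.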


A case where a spindly two-layer linear network whips any neural network with a fully connected input layer

Warmuth, Manfred K., Kotłowski, Wojciech, Amid, Ehsan

arXiv.org Machine Learning

It was conjectured that no neural network, of any structure and with arbitrary differentiable transfer functions at the nodes, can learn the following problem sample-efficiently when trained with gradient descent: the instances are the rows of a $d$-dimensional Hadamard matrix and the target is one of the features, i.e. very sparse. We essentially prove this conjecture: we show that after receiving a random training set of size $k < d$, the expected square loss is still $1-\frac{k}{d-1}$. The only requirement is that the input layer is fully connected and the initial weight vectors of the input nodes are drawn from a rotation-invariant distribution. Surprisingly, the same type of problem can be solved drastically more efficiently by a simple 2-layer linear neural network in which the $d$ inputs are connected to the output node by chains of length 2 (now the input layer has only one edge per input). When such a network is trained by gradient descent, it has been shown that its expected square loss is $\frac{\log d}{k}$. Our lower bounds essentially show that a sparse input layer is needed to sample-efficiently learn sparse targets with gradient descent when the number of examples is less than the number of input features.
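The fully connected lower bound has a simple numerical illustration. Gradient descent from zero initialization (a convenient stand-in here; the paper's bound concerns rotation-invariant random initializations) on $k$ mutually orthogonal Hadamard rows converges to the minimum-norm least-squares solution, which predicts exactly zero on every unseen row, so the average square loss over all $d$ rows is $1 - k/d$, mirroring the $1 - \frac{k}{d-1}$ bound. A self-contained check:

```python
import numpy as np

rng = np.random.default_rng(0)

def hadamard(d):
    # Sylvester construction; d must be a power of two.
    H = np.array([[1.0]])
    while H.shape[0] < d:
        H = np.block([[H, H], [H, -H]])
    return H

d, k = 64, 16
H = hadamard(d)
y = H[:, 3].copy()                       # target: a single (very sparse) feature

S = rng.choice(d, size=k, replace=False) # random training set of k < d rows
X_train, y_train = H[S], y[S]

# Gradient descent from zero on a single fully connected linear layer.
w = np.zeros(d)
lr = 0.1 * k / d
for _ in range(500):
    w -= lr * 2.0 / k * X_train.T @ (X_train @ w - y_train)

# Hadamard rows are orthogonal, so the iterate converges to w = X^T y / d:
# predictions are perfect on the k training rows and exactly 0 elsewhere.
pred = H @ w
mse = np.mean((pred - y) ** 2)
print(f"square loss over all {d} rows: {mse:.3f}")   # 1 - k/d
```

The spindly network escapes this because its chain parameterization ($w_i = u_i v_i$) biases gradient descent toward sparse solutions, which is exactly what the sparse target rewards.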